AI & Data Literacy by Bill Schmarzo

Author: Bill Schmarzo
Language: eng
Format: epub
Publisher: Packt
Published: 2023-11-15T00:00:00+00:00


Understanding probabilities and statistics

Making predictions about likely outcomes is a challenging task. As famously stated by Yogi Berra, “It’s tough to make predictions, especially about the future.” Accurate predictions rely on a nuanced understanding of probabilities, confidence levels, and confidence intervals.

Probability is a measure of the likelihood that a particular event will occur, expressed as a number from 0 to 1 or, equivalently, as a percentage from 0% to 100%. For example, using Barry Bonds’ 2004 season with the San Francisco Giants, we can calculate the probability of him getting a hit in a given at-bat as 36.2% (equivalent to 36.2 hits for every 100 at-bats).
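Here is a minimal Python sketch of that calculation. The probability is estimated as hits divided by at-bats. The counts used below (362 hits in 1,000 at-bats) are hypothetical. They were chosen only to reproduce the 36.2% figure and are not Bonds’ actual season totals.

```python
def hit_probability(hits: int, at_bats: int) -> float:
    """Return the empirical probability of a hit as a fraction between 0 and 1."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def as_percentage(p: float) -> str:
    """Format a probability (0 to 1) as a percentage with one decimal place."""
    return f"{p * 100:.1f}%"


# Hypothetical counts chosen to match the 36.2% figure in the text;
# these are not Bonds' real hits and at-bats.
p = hit_probability(362, 1000)
print(as_percentage(p))  # 36.2%
```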

Understanding probabilities helps us assess the likelihood of specific outcomes and gives us the insight we need to make informed decisions. However, probabilities are estimates derived from available data and statistical analysis. They tell us how likely outcomes are relative to one another, but they do not guarantee any particular outcome. To make our predictions more effective, we need to harness the power of statistics.

Statistics is the practice or science of collecting and analyzing numerical data in large quantities, especially to infer proportions in a whole population from those in a representative sample. By leveraging statistical techniques, we can analyze patterns, identify correlations, and uncover valuable insights that enable us to make more accurate and reliable predictions.
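The following small Python sketch shows how a population proportion can be inferred from a representative sample. The population is hypothetical and is constructed so that exactly 30% of its members have some trait. A simple random sample of 1,000 members then yields an estimate that is typically close to 30%, though not necessarily equal to it. The population size, sample size, and seed are arbitrary assumptions.

```python
import random

# Hypothetical population of 100,000 members, exactly 30% of whom have a trait
# (True). In real work the true proportion is unknown; that is why we sample.
population = [True] * 30_000 + [False] * 70_000

random.seed(42)  # fixed seed so the run is reproducible
sample = random.sample(population, k=1_000)  # simple random sample, no replacement

# Proportion of sampled members with the trait, used as the population estimate
sample_proportion = sum(sample) / len(sample)
print(f"Estimated proportion from the sample: {sample_proportion:.3f}")
```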

When we use statistics to calculate probabilities and make predictions, we need to understand a handful of basic statistical concepts: the mean (or average), variance, standard deviation, confidence interval, and confidence level. Everyone who wants to leverage statistics to make more informed decisions should understand them. Let’s define each one:

The mean or average is the sum of a collection of numbers divided by the count of numbers in the collection.
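In Python, the definition translates directly into code. The sketch below uses a small illustrative dataset. Python’s standard library also offers `statistics.mean` for the same purpose.

```python
def mean(values):
    """Sum of the values divided by how many values there are."""
    values = list(values)
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
print(mean(data))  # 5.0  (40 / 8)
```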

Variance measures how far the numbers or observations in a set are spread out from the mean of that same set. It is calculated as the average of the squared differences between each value and the mean, so a larger variance means the data is more dispersed around the mean.
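The following Python sketch computes variance as the average squared distance from the mean. It uses the same illustrative dataset as the mean example. The sketch also shows a common convention that the text does not cover. When the data is a sample drawn from a larger population, the sum of squared differences is usually divided by n − 1 instead of n. Python’s `statistics` module provides both versions as `pvariance` and `variance`.

```python
def population_variance(values):
    """Average squared distance of each value from the mean (divide by n)."""
    values = list(values)
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Same idea, but divide by n - 1 when the data is a sample of a larger population."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5; squared differences sum to 32
print(population_variance(data))  # 4.0  (32 / 8)
print(sample_variance(data))      # about 4.571  (32 / 7)
```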

Standard deviation is simply the square root of the variance, which expresses the spread in the same units as the original data. A low standard deviation (near zero) means the data points are clustered closely around the mean, while a high standard deviation means they are more spread out. Standard deviation is never negative, and it does not tell us whether data points lie above or below the mean, only how far from the mean they typically lie.
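The following Python sketch computes the population standard deviation for two hypothetical datasets. Both datasets have the same mean of 50 but very different spreads. The clustered values produce a small standard deviation, and the spread-out values produce a large one.

```python
import math


def std_dev(values):
    """Population standard deviation: the square root of the population variance."""
    values = list(values)
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation requires at least one value")
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)


# Two hypothetical datasets, both with a mean of 50
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

print(std_dev(clustered))   # about 1.41 (sqrt of 2): values sit close to the mean
print(std_dev(spread_out))  # about 28.28 (sqrt of 800): values lie far from the mean
```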

The confidence interval is a range of values, calculated from sample data, that you expect to contain the true population value a certain percentage of the time if you rerun your experiment or re-sample the population in the same way.
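The following Python sketch computes a confidence interval for a mean using the normal (z) approximation. The interval is the sample mean plus or minus a critical value times the standard error, where the standard error equals the standard deviation divided by the square root of n. The data is hypothetical, generated from a normal distribution with a fixed seed. The z approximation is an assumption that works reasonably well for large random samples. For small samples, a t-distribution critical value is the usual choice.

```python
import math
import random
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * s / math.sqrt(n)  # critical value times the standard error
    return m - margin, m + margin


rng = random.Random(7)  # fixed seed for reproducibility
sample = [rng.gauss(100, 15) for _ in range(200)]  # hypothetical measurements

low, high = mean_confidence_interval(sample)
print(f"95% confidence interval for the mean: ({low:.2f}, {high:.2f})")
```

Asking for a higher confidence level, such as 0.99, widens the interval around the same sample mean.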

The confidence level is that percentage. It is the share of repeated experiments or samples whose confidence intervals you expect to contain the true value; 95% is a common choice.
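A confidence level can be checked by simulation. This Python sketch repeatedly draws hypothetical samples from a normal distribution whose true mean is known. It builds a normal-approximation interval from each sample and counts how often that interval contains the true mean. The resulting fraction should land near the chosen confidence level. The true mean (100), standard deviation (15), sample size (100), number of trials, and seed are all arbitrary assumptions.

```python
import math
import random
import statistics


def z_interval(sample, confidence):
    """Normal-approximation confidence interval for the mean of `sample`."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample std dev
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)   # critical value
    margin = z * s / math.sqrt(n)
    return m - margin, m + margin


def coverage(confidence, true_mean=100.0, sd=15.0, n=100, trials=2_000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    contained = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = z_interval(sample, confidence)
        if low <= true_mean <= high:
            contained += 1
    return contained / trials


print(coverage(0.95))  # should be near 0.95
print(coverage(0.80))  # should be near 0.80
```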
